Abstract

We analyze different methods of sorting and selecting a set of objects by their intrinsic value, via pairwise comparisons whose outcome is uncertain. After discussing the limits of repeated Round Robins, two new methods are presented: the ran-fil method requires no previous knowledge of the set under consideration, yet displays good performance even in the least favorable case. The min-ent method sets a benchmark for optimal dynamic tournament design.
Introduction



In the Internet era the amount of available information is overwhelming: the problem of finding and selecting the most relevant items therefore becomes crucial. Each selection operation is noisy, yet we want to estimate the minimal amount of resources necessary to sort a large number of items. Our study is a first step towards establishing a firm theoretical bound for the new Information Theory introduced in [1], which aims to give a theoretical basis for evaluating and improving current and future search engines.

The method of paired comparisons has been extensively studied by statisticians [2] and widely applied in various fields. Usually one has to rank $N$ objects (or agents), each one endowed with a scalar quality $q_i$, $i = 1, 2, \dots, N$, on the basis of a finite number $t_c$ of binary comparisons. The "true" rank $R_i$ of item $i$ is assumed to be zero if $q_i$ is the highest quality, one for the second highest, and so forth. Upon assuming an a priori probability distribution of quality $\phi(q)$ and a given probability distribution $p_{ij}$ of outcomes of comparisons between two objects of known quality, it is possible to write the joint likelihood of the outcomes as a function of individual qualities. The best guess for the quality set is the one that maximizes the likelihood, i.e. the one that would produce the given outcome with the highest probability.
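As a concrete illustration of this maximum-likelihood step, the sketch below fits qualities under the Bradley-Terry model (4) introduced later, using Zermelo's classical fixed-point iteration (the same update is often presented as a minorization-maximization algorithm). The function name `bt_mle` and the input convention (`wins[i][j]` = number of times $i$ was preferred over $j$) are our own; we simply assume that the directed "i beat j" graph is strongly connected, which is the condition for the estimate to exist.

```python
def bt_mle(wins, iters=10000, tol=1e-12):
    """Maximum-likelihood Bradley-Terry qualities from a win matrix.

    wins[i][j] is the number of times item i was preferred over item j.
    Zermelo's fixed-point update q_i <- w_i / sum_j n_ij / (q_i + q_j) is
    applied to all items at once; the MLE exists only if the directed
    "i beat j" graph is strongly connected (assumed here).
    Qualities are defined up to a common scale, so they are normalized to sum 1.
    """
    n = len(wins)
    w = [sum(row) for row in wins]                                   # total wins per item
    games = [[wins[i][j] + wins[j][i] for j in range(n)] for i in range(n)]
    q = [1.0 / n] * n
    for _ in range(iters):
        new = [w[i] / sum(games[i][j] / (q[i] + q[j]) for j in range(n) if j != i)
               for i in range(n)]
        total = sum(new)
        new = [x / total for x in new]                               # fix the scale
        if max(abs(a - b) for a, b in zip(new, q)) < tol:
            return new
        q = new
    return q
```

For `wins = [[0, 7, 8], [3, 0, 6], [2, 4, 0]]` every pair has played ten games, and the fitted qualities are ordered like the total wins (15, 9, 6).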

On the other hand, scarce attention has been devoted so far to the problem of designing optimal tournaments with a limited number of games. Round Robin (RR) tournaments, for instance, are common but not very effective, because they assume that all comparisons are equally useful. In fact, one might be more interested in the upper part of the ranking than in its lower end, and the result of some comparisons could be foreseen with high precision on the basis of previous outcomes. This fact motivates the filters we shall develop in the present paper. Let us first discuss the theoretical limits of repeated RRs with the following example.



1 Best selection via Round Robins 

We want to select the best out of $N = N(0)$ objects, by successive elimination, in $k_c$ rounds. Round $k$ is completed once a RR among all the (surviving) $N(k)$ objects is performed; after each round $\Delta N(k)$ objects are eliminated. Thus

$N(k+1) = N(k) - \Delta N(k)$.

If the elimination were perfectly effective, after $k$ rounds the quality of the selected objects would be uniformly distributed in the range $(1 - \gamma(k), 1)$, where $\gamma(k) \propto N(k)/N$.

Consider now two objects $i$ and $j$, with a difference in mutual preference probabilities $\epsilon_{ij} = |p_{ij} - p_{ji}|$, that are compared $\nu$ times. Their average difference in points will be of order $\nu\epsilon_{ij} \pm \sqrt{\nu}$. In many models of interest, like the Bradley-Terry (BT) model introduced below in (4), the average $\epsilon$ of $\epsilon_{ij}$ in the quality domain $(1 - \gamma, 1)$ is proportional to $\gamma$. In this case, in order to have a significant separation of two agents, they have to play at least

$\nu \sim 1/\epsilon^2(k) \propto 1/\gamma^2(k) \propto N^2/N^2(k)$ (1)

games on average. In addition, we should keep in mind that each object is compared with the remaining $N - 1$ in the first round. If we want to keep a constant selecting power, another factor $N/N(k)$ has then to be considered to account for the diminishing number of opponents. In all, each object needs to be compared $\nu \sim N^3/N^3(k)$ times before we can decide whether it survives round $k$.
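The separation argument behind Eq. (1) can be checked with a few lines of code. In this sketch (all helper names are hypothetical) each game adds $+1$ or $-1$ to the point difference of $i$ over $j$, so after $\nu$ games the mean is $\nu\epsilon$ and the standard deviation is $\sqrt{\nu(1-\epsilon^2)} \approx \sqrt{\nu}$. Requiring the mean to exceed $z$ standard deviations gives $\nu \approx z^2/\epsilon^2$, so halving $\epsilon$ roughly quadruples the number of games needed.

```python
import math
import random


def gap_moments(eps, nu):
    """Mean and standard deviation of (points of i) - (points of j) after nu games.

    Each game adds +1 if i is preferred and -1 otherwise; with eps = p_ij - p_ji
    the per-game mean is eps and the per-game variance is 1 - eps**2.
    """
    return nu * eps, math.sqrt(nu * (1.0 - eps**2))


def games_needed(eps, z=1.0):
    """Smallest nu whose mean gap nu*eps is at least z standard deviations."""
    return math.ceil(z**2 * (1.0 - eps**2) / eps**2)


def simulate_gap(eps, nu, trials, rng):
    """Monte Carlo estimate of the mean point gap, as a check of gap_moments."""
    p_ij = (1.0 + eps) / 2.0            # since p_ij + p_ji = 1
    total = 0
    for _ in range(trials):
        wins = sum(rng.random() < p_ij for _ in range(nu))
        total += 2 * wins - nu          # i's points minus j's points
    return total / trials
```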

Now suppose that, at round zero, we eliminate a fraction $\alpha$ of the original $N$ objects. At round $k$, eliminating the same fraction $\alpha$ of the remaining $N(k)$ objects takes about $\nu$ rounds, with $\nu$ given by (1). Thus

$\Delta N(k) \propto \alpha \frac{N(k)}{\nu} \propto \alpha \frac{N^3(k)}{N^2}$.
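The stage-by-stage bookkeeping can also be iterated directly, before any continuum approximation. The sketch below assumes, in line with Eq. (1), that a stage with $n$ survivors requires about $(N/n)^2$ full Round Robins of $n(n-1)/2$ comparisons each, after which a fraction $\alpha$ of the survivors (at least one) is dropped; the name `repeated_rr_cost` and the default $\alpha = 0.2$ are illustrative choices. Every stage then costs about $N^2/2$ comparisons and the number of stages grows like $\log N$, so the total should track $N^2 \log N$.

```python
import math


def repeated_rr_cost(n_items, alpha=0.2):
    """Total comparisons to isolate one champion by repeated Round Robins.

    Assumed bookkeeping: at each stage the n survivors play nu = (N/n)**2
    full Round Robins (n(n-1)/2 comparisons each), after which a fraction
    alpha of them, and at least one, is eliminated.
    """
    n, total = n_items, 0.0
    while n > 1:
        nu = (n_items / n) ** 2                 # repetitions needed, cf. Eq. (1)
        total += nu * n * (n - 1) / 2           # cost of this stage, about N^2 / 2
        n -= max(1, math.floor(alpha * n))      # successive elimination
    return total


for N in (100, 1000, 10000):
    print(N, repeated_rr_cost(N) / (N**2 * math.log(N)))
```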



If we take the continuum limit, $dN/dk \simeq -\alpha N^3(k)/N^2$ (absorbing the proportionality constant into $\alpha$), and solve the resulting differential equation, we obtain

$N(k) \simeq \frac{N}{\sqrt{2\alpha k}}$. (2)

The minimum number $n_c$ of comparisons needed for the champion to arise can be calculated by summing the cost of each Round Robin, $N^2(k)/2 \simeq N^2/(4\alpha k)$ comparisons with $N(k)$ given by (2), up to the round $k_c \propto N^2$ at which a single object survives. The result reads

$n_c \propto N^2 \log N$. (3)

In the remainder of this paper we propose two new procedures intended to improve on this result.



2 The ran-fil method

We shall introduce an algorithm that can be used for finding an object with a large quality, i.e. a highly ranked one, with no previous knowledge of $\phi(q)$ and $p_{ij}$. Inspired by a previous work [1], it is organized in rounds. In each round we line up objects labeled with numbers $1, \dots, N$ ($N$ assumed to be even) and compare all the odd-numbered ones with their even-numbered successors: 1 with 2, 3 with 4 and so on. A round ends once items $N-1$ and $N$ have been compared. Winners replace losers: if the object at site $i$ is preferred over that at $i+1$, we replace the latter with a copy of the former, and vice versa. Before a new round starts, all objects are reshuffled. We define the time $t$ as the number of pairwise comparisons made (excluding cases where an agent is compared to itself). Table 1 shows an example of a run of our algorithm which, after $k$ rounds, yields our designated winner, i.e. our guess for the agent with the best quality.

Using the above approach, however, there is a rather large probability of losing the item with the highest quality in one of the initial rounds, and therefore of ending up with a poor average winner's rank $\langle R_w \rangle$. To avoid this we introduce some noise: in each round we reintroduce objects that have been eliminated in earlier rounds. The noise level is fixed and set by the number of agents $\eta$ we reintroduce in each round. Agents gain one point each time they are preferred. Finally, for a given time $t_c$, we estimate the agent with the best rank as the one who has gained the most points.
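A minimal simulation sketch of the ran-fil procedure described above, under assumptions that the text leaves open: a reintroduced item overwrites a uniformly random site, and a fractional $\eta$ is realized by accumulating credit ($\eta = 0.5$ reintroduces one item every second round). The names `ran_fil` and `bt_prob` are hypothetical; `bt_prob` implements the Bradley-Terry probability (4) used in Section 2.1, and any other preference function can be passed instead.

```python
import random
from collections import Counter


def bt_prob(q_i, q_j):
    """Bradley-Terry probability (4) that an item of quality q_i is preferred over q_j."""
    return q_i / (q_i + q_j)


def ran_fil(qualities, t_c, eta=0.5, rng=None, prob=bt_prob):
    """Run a ran-fil sketch for t_c >= 1 comparisons; return the index with most points.

    Assumptions (ours): a reintroduced item overwrites a uniformly random site,
    and eta reintroductions per round are realized through a fractional credit.
    """
    rng = rng or random.Random()
    n = len(qualities)
    if n % 2:
        raise ValueError("N is assumed to be even")
    sites = list(range(n))                  # sites[s] = item currently at site s
    points = Counter()
    t, credit = 0, 0.0
    while t < t_c:
        if eta == 0 and len(set(sites)) == 1:
            break                           # converged, and no noise can reopen the race
        credit += eta
        present = set(sites)
        eliminated = [i for i in range(n) if i not in present]
        while credit >= 1.0 and eliminated:
            item = eliminated.pop(rng.randrange(len(eliminated)))
            sites[rng.randrange(n)] = item  # noise: reintroduce an eliminated item
            credit -= 1.0
        credit %= 1.0                       # drop credit that found no eliminated item
        rng.shuffle(sites)                  # reshuffle before each round
        for s in range(0, n, 2):            # sites (1,2), (3,4), ... (0-based here)
            a, b = sites[s], sites[s + 1]
            if a == b:
                continue                    # self-comparisons do not count towards t
            winner = a if rng.random() < prob(qualities[a], qualities[b]) else b
            sites[s] = sites[s + 1] = winner            # winner replaces loser
            points[winner] += 1
            t += 1
            if t >= t_c:
                break
    return max(points, key=points.get)
```

For example, `ran_fil(q, t_c=5000, eta=0.5, rng=random.Random(1))` returns the index of the estimated best item for a list `q` of qualities; averaging the true rank of the returned index over many seeds gives a curve of the kind shown in Fig. 1.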
Table 1

Example of a run of the ran-fil algorithm with no noise. 
Fig. 1. The average rank $\langle R_w \rangle$ of the winning agent versus the number of comparisons in the ran-fil method, for various values of $N$ (shown next to each line). The rank is averaged over 10000 runs and the noise level is set such that an agent is re-introduced every second round, i.e. $\eta = 0.5$. The dashed line is proportional to $1/\sqrt{t}$.

2.1 Testing the ran-fil method on the BT model



Let us now focus on the model 

$p_{ij} = \frac{q_i}{q_i + q_j}$, (4)



first proposed by Zermelo [4] but often referred to as the Bradley-Terry (BT) model [5]. We tested the ran-fil method on a population of agents whose qualities were uniformly distributed between zero and one. Fig. 1 shows the average rank of the winner as a function of the total number of comparisons $t_c$, for different values of $N$ and with a reintroduction level $\eta = 0.5$. For the first few rounds our algorithm is similar to a knock-out tournament, i.e. the number of teams is halved in each round. Note that, after some transient, the rank shows a characteristic scaling in the number of comparisons, $\langle R_w \rangle \propto t_c^{\beta}$, and that the scaling exponent $\beta$ seems to be independent of the number of agents.

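In the BT model the difference in mutual preference probabilities of two agents with qualities $a$ and $a - \delta$ ($1 \gg \delta > 0$) is

$\epsilon = \frac{a}{2a - \delta} - \frac{a - \delta}{2a - \delta} = \frac{\delta}{2a - \delta} = \frac{\delta}{2a} + O(\delta^2)$. (5)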
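A quick numerical check of (5), using a hypothetical helper `bt_eps`: the exact difference $\delta/(2a - \delta)$ approaches $\delta/(2a)$ as $\delta \to 0$, so $\epsilon$ grows linearly in $\delta$.

```python
def bt_eps(a, delta):
    """p_ij - p_ji in the BT model (4) for qualities q_i = a and q_j = a - delta."""
    q_i, q_j = a, a - delta
    return q_i / (q_i + q_j) - q_j / (q_i + q_j)


# Compare the exact value with the leading-order term delta / (2a) of (5).
for delta in (0.1, 0.01, 0.001):
    exact = bt_eps(0.9, delta)
    print(delta, exact, delta / (2 * 0.9), exact / (delta / (2 * 0.9)))
```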



This means that when the surviving agents all have qualities close to one, their difference $\epsilon$ scales almost linearly with $\delta$, and equation (1) holds. In the large-$N$ limit we can thus write down a differential equation for $\epsilon(t)$:



where $a$ is a constant. The solution scales as

$\epsilon(t) \propto t^{-1/2}$. (7)



It now follows that the estimated rank, $R_w(t)$, can be approximated by




In Fig. 1 we have added a line with a scaling exponent $\beta = -0.5$ in the number of comparisons. After some transient, the numerical results are in excellent agreement with the predicted scaling (7).

The ran-fil algorithm is directly applicable when one wants to estimate the complete ranking table. In this case we define $\epsilon$ as

$\epsilon(t) = \frac{1}{N} \sum_{i=1}^{N} |R_i(t) - R_i|$,

where $R_i(t)$ is the estimated rank of agent $i$ at time $t$ and $R_i$ its true rank. Following the same argument as above we get the same scaling exponent; the prefactor, however, is larger.
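The error measure above is straightforward to compute from the points collected by the filter. In the sketch below (function names are ours) estimated ranks are obtained by sorting items by points, with ties broken by index; the tie rule is an assumption, since the text does not specify one.

```python
def ranks_from_scores(scores):
    """Rank 0 for the highest score, 1 for the next, ... (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    ranks = [0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


def rank_error(estimated_ranks, true_ranks):
    """epsilon(t) = (1/N) * sum_i |R_i(t) - R_i|."""
    n = len(true_ranks)
    return sum(abs(a - b) for a, b in zip(estimated_ranks, true_ranks)) / n
```

For qualities (0.9, 0.5, 0.7, 0.1) and points (5, 4, 3, 0), the true ranks are (0, 2, 1, 3), the estimated ranks are (0, 1, 2, 3), and $\epsilon = 0.5$.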
Note that the noise is defined as simply as possible, without any reference to an underlying distribution of the $p_{ij}$. One could, based on a priori knowledge of such a distribution, tune the noise to increase the performance of our filter. Below we show that tuning the noise can improve performance dramatically.



3 The min-ent method 

We shall outline here a method in which we assume that the fitness distribution $\phi(q)$ and the functional form of $p_{ij} = f(q_i, q_j)$ are known a priori. In real life this is very rarely the case, unless these quantities can be reliably estimated from data collected in the past. In some sports, for example, previous championships could provide such data. Further inquiry is needed to check how robust this method is with respect to errors in the existing knowledge. Similar problems are widely dealt with in the literature, so we shall not tackle this question here.

The idea of our method is to choose the pairs to compare dynamically during the tournament. At every time step a comparison $x(t)$ is performed and its outcome $w(t)$ recorded. By time step $t$ each pair $(i, j)$ has played $n_{ij}(t) \in [0, t]$ games. Let us denote by $w_{ij}(t)$ the number of times $i$ has beaten $j$ up to time $t$, and by $W(t) = (w_{ij}(t))$ the matrix of results collected until time $t$. It follows that $w_{ij}(t) + w_{ji}(t) = n_{ij}(t)$ for each pair.
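The bookkeeping of $W(t)$ and $n_{ij}(t)$, together with the connectivity condition used below, can be sketched as follows. We read "connected" as connectivity of the undirected graph whose edges are the pairs that have played at least once; this reading and the helper names are our assumptions.

```python
from collections import deque


def results_matrix(n_items, outcomes):
    """Build W with W[i][j] = number of times i beat j, from (winner, loser) pairs."""
    W = [[0] * n_items for _ in range(n_items)]
    for winner, loser in outcomes:
        W[winner][loser] += 1
    return W


def games_played(W):
    """n_ij = w_ij + w_ji for every pair."""
    n = len(W)
    return [[W[i][j] + W[j][i] for j in range(n)] for i in range(n)]


def is_connected(W):
    """True if every item is reachable through pairs that have played at least once."""
    n = len(W)
    seen, queue = {0}, deque([0])
    while queue:                          # breadth-first search from item 0
        i = queue.popleft()
        for j in range(n):
            if j not in seen and W[i][j] + W[j][i] > 0:
                seen.add(j)
                queue.append(j)
    return len(seen) == n
```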

Let us now focus on the problem of finding only the best item. Without any prior information it is natural to draw the pairs to compare in the first round from a uniform distribution. Once the matrix $W(t)$ is connected, though, we can adapt this distribution so as to maximize the acquisition of new information. The information provided by a new comparison $x_{u,v}$ can be quantified as

$I_{W(t)}(x_{u,v}) = 1 - H(\psi \mid W(t) \cup x_{u,v}) / H_{\max}$, (8)

where $H(\psi \mid W(t) \cup x_{u,v})$ is the entropy of the conditional distribution $\psi(k \mid W(t) \cup x_{u,v})$, the probability that site $k$ has the highest fitness, given the matrix of outcomes up to time $t$ plus a new comparison $x_{u,v}$. Ideally one would like this probability distribution to be as peaked as possible around the most probable value, which translates into information maximization. This is a general procedure, but different models require specific definitions of the expected conditional information and of the weights characterizing the importance sampling one wishes to apply [6].
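Definition (8) can be evaluated directly for any $\psi$. The sketch assumes $H_{\max} = \log N$, the entropy of the uniform distribution over $N$ items, which the text does not state explicitly; with this choice $I = 0$ for a uniform $\psi$ and $I = 1$ when $\psi$ is concentrated on a single item.

```python
import math


def entropy(psi):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in psi if p > 0)


def information(psi):
    """I = 1 - H(psi) / H_max, assuming H_max = log N (entropy of the uniform psi)."""
    return 1.0 - entropy(psi) / math.log(len(psi))
```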

Here we shall test the procedure of comparing the maximizing pair at each time step, i.e.

$x(t+1) = \arg\max_{x} I_{W(t)}(x)$, (9)

with the following definition of $\psi$:

$\psi(k \mid W(t) \cup x_{u,v}) = p_{u,v}\, \psi(k \mid W(t) \cup w_{u,v}) + p_{v,u}\, \psi(k \mid W(t) \cup w_{v,u})$.

This corresponds to taking the expected value over the unknown outcome. The extension to the determination of the entire ranking, or part of it, follows from the same reasoning.



3.1 Finding the best item in a test model 



We test the outlined method on the model [7]

$p_{bj} = \pi > 0.5, \quad j \neq b,$
$p_{ij} = 0.5, \quad i, j = 1, 2, \dots, N;\ i, j \neq b,$ (10)

which is the least favorable among many instances [9] and analytically solvable in the case of Round Robins [2]. Here, clearly, the assumption on a priori distributions translates into known win probabilities. Thus

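$\psi(k \mid W(t)) \propto 2^{n_k}\, \pi^{w_k} (1 - \pi)^{n_k - w_k}$,

where $n_k = \sum_j n_{kj}$ and $w_k = \sum_j w_{kj}$. The factor $2^{n_k}$ compensates for the probability $1/2$ that each of item $k$'s games would have if $k$ were not the best item.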
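For model (10) the posterior $\psi(k \mid W)$ can be computed directly from the likelihood, assuming a uniform prior over which item is $b$. If $k = b$, each of its $n_k$ games has probability $\pi$ or $1 - \pi$ and all other games have probability $1/2$, so relative to the all-fair baseline the likelihood is $(2\pi)^{w_k}[2(1-\pi)]^{n_k - w_k}$; the factor $2^{n_k}$ matters whenever items have played different numbers of games. The sketch below (function name hypothetical) works in log space to avoid underflow.

```python
import math


def posterior_best(W, pi):
    """psi(k | W): probability that item k is the special item b of model (10).

    W[i][j] = number of times i beat j. Assumes a uniform prior over k.
    Relative to the all-fair baseline (every game has probability 1/2), the
    likelihood of "k = b" is (2 pi)^w_k * (2 (1 - pi))^(n_k - w_k).
    """
    n = len(W)
    log_lik = []
    for k in range(n):
        wins = sum(W[k][j] for j in range(n) if j != k)
        losses = sum(W[j][k] for j in range(n) if j != k)        # n_k - w_k
        log_lik.append(wins * math.log(2 * pi) + losses * math.log(2 * (1 - pi)))
    top = max(log_lik)                     # subtract the maximum to avoid underflow
    weights = [math.exp(x - top) for x in log_lik]
    norm = sum(weights)
    return [x / norm for x in weights]
```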

We proceed as follows: at least $N$ comparisons are made beforehand, in such a way that the matrix of outcomes is connected. Then, at each time step $t$, we compute, for all possible pairs $(u, v)$, the conditional entropy

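$H(\psi \mid W(t) \cup x_{u,v}) = \frac{H(\psi \mid W(t)) + (1 - a_\pi)(\psi_u \log \psi_u + \psi_v \log \psi_v) - a_\pi \log a_\pi \, (\psi_u + \psi_v)}{1 - (\psi_u + \psi_v)(1 - a_\pi)} + \log\left[1 - (\psi_u + \psi_v)(1 - a_\pi)\right]$, (11)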



where $a_\pi = 2[1 - 2\pi(1 - \pi)]$ and $\psi_y$ stands for $\psi(y \mid W(t))$. At the next time step we compare the pair $x(t+1)$ satisfying condition (9).
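The entropy of the expected distribution and the selection rule (9) can be sketched as below. We assume our reading of the expected update in model (10): $\psi_u$ and $\psi_v$ are multiplied by $a_\pi$, the other $\psi_k$ are left unchanged, and the result is renormalized by $1 - (\psi_u + \psi_v)(1 - a_\pi)$; the function returns the entropy of that distribution in closed form. Since $H_{\max}$ is fixed, maximizing (8) amounts to minimizing this entropy. Function names are hypothetical.

```python
import math
from itertools import combinations


def entropy(psi):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in psi if p > 0)


def xlogx(p):
    return p * math.log(p) if p > 0 else 0.0


def expected_entropy(psi, u, v, pi):
    """Closed-form entropy after psi_u, psi_v are scaled by a_pi and psi is renormalized."""
    a = 2.0 * (1.0 - 2.0 * pi * (1.0 - pi))         # a_pi
    s = psi[u] + psi[v]
    d = 1.0 - s * (1.0 - a)                          # normalization of the reweighted psi
    num = entropy(psi) + (1.0 - a) * (xlogx(psi[u]) + xlogx(psi[v])) - a * math.log(a) * s
    return num / d + math.log(d)


def choose_pair(psi, pi):
    """Rule (9): the pair minimizing the expected entropy, i.e. maximizing (8)."""
    return min(combinations(range(len(psi)), 2),
               key=lambda uv: expected_entropy(psi, uv[0], uv[1], pi))
```

For $\psi = (0.4, 0.3, 0.2, 0.1)$ and $\pi = 0.6$, `choose_pair` returns `(0, 1)`: the two current favourites are compared next.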

Fig. 2. Information (12) (squares) and percentage $r_w$ of games played by the fittest agent (circles) over time in model (10).

Fig. 3. Average number of games needed to find the best object with probability 0.6 in model (10), with $\pi = 0.60$ in the upper and $\pi = 0.55$ in the lower panel; the horizontal axis is $N$. Symbols report simulation results for a repeated Round Robin tournament (triangles) and for the min-ent method (circles). The dashed lines are for the ran-fil method with $\eta = 1$ and $\eta = 7$, where $\eta = 7$ gives the best performance for large $N$. In the upper panel the min-ent data are fitted linearly and the Round Robin data with a power law of exponent 1.5. In the lower panel the fitted exponents are 0.82 and 1.25, respectively.

We assign $\sum_{j \neq k} w_{kj}/n_{kj}$ points to item $k$ and declare the winner to be the one that has collected the most points at time $t_c$. Then we check whether our guess is right. Notice that, although the above rule is arbitrary, no other rule would notably improve our results [2]. Even maximum likelihood estimations, which are the best ones under our hypothesis, give the same ranking as ours in the case of Round Robins [4,8], yet involve much heavier calculations.
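The scoring rule of the min-ent method can be written as a small function. Pairs that have not played ($n_{kj} = 0$) are skipped, an assumption since the ratio is undefined there; the helper names are hypothetical.

```python
def min_ent_scores(W):
    """Points of item k: sum over opponents j with n_kj > 0 of w_kj / n_kj."""
    n = len(W)
    scores = []
    for k in range(n):
        s = 0.0
        for j in range(n):
            n_kj = W[k][j] + W[j][k]
            if j != k and n_kj > 0:          # pairs that never met are skipped
                s += W[k][j] / n_kj
        scores.append(s)
    return scores


def declared_winner(W):
    """Index of the item with the most points (lowest index on ties)."""
    scores = min_ent_scores(W)
    return max(range(len(scores)), key=scores.__getitem__)
```

For $W$ with rows (0, 3, 1), (1, 0, 2), (0, 2, 0) the scores are (1.75, 0.75, 0.5), so item 0 is declared the winner.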

We tested the min-ent method on model (10). First we verified that it converges, i.e. that the entropy (11) goes to zero. Figure 2 shows that this is indeed the case: the information

$I_{W(t)} = 1 - H(\psi \mid W(t)) / H_{\max} \to 1$ (12)

as we employ more resources ($t \to \infty$), and so does the percentage $r_w$ of games played by the fittest agent. We then compared the performance of the min-ent method with that of the ran-fil method and of repeated RRs; results are shown in Figure 3. The total number of comparisons $t_c$ needed to pick the best item with a given probability (0.6 in the figure) seems to scale linearly with the number of items $N$ using the min-ent method, while it follows a super-linear power law for RRs. The ran-fil method, although it performs worse than min-ent, clearly outperforms Round Robins. This last result is particularly promising, since ran-fil is widely applicable to real filters.



Conclusions 



We have analyzed different methods of selecting among a set of $N$ objects by means of $n_c$ pairwise comparisons. In particular we focused on the amount of resources one has to spend in order to select the fittest. We argued that the best one can obtain with repeated Round Robins with successive elimination, for a wide variety of probabilistic models, is $n_c \propto N^2 \log N$. We then introduced two new methods of performing the selection. Both give better results than repeated RRs in a worst-case test model: with the min-ent method, based on information maximization, we obtained $n_c \propto N$. The ran-fil method has been shown to outperform Round Robins without requiring any previous knowledge of the underlying model. Its basic principles are to keep an almost constant selecting power at each round and to gradually eliminate losers. We believe these methods constitute a useful proposal for improving tournament design, with particular reference to the ranking methods of Internet search engines.